
Conversation

Collaborator

@dtrawins dtrawins commented Dec 9, 2025

🛠 Summary

CVS-177455
Added a tool parser for the Devstral model.

🧪 Checklist

  • Unit tests added.
  • Documentation updated.
  • Change follows security best practices.

DevstralToolParser() = delete;
DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) :
BaseOutputParser(tokenizer),
argsTokenId(tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),
Collaborator

@dkalinowski dkalinowski Dec 10, 2025

How do we ensure that [ARGS] / [TOOL_CALLS] are single tokens, treated as special tokens and not as plain strings that could split into, for example, [AR and GS]?

Collaborator Author

Those are special tokens.

Collaborator

That doesn't answer my question.

Collaborator Author

The Devstral parser sets requiresStreamingWithSpecialTokens() to true.
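
For reference, a minimal sketch of that override (assuming BaseOutputParser declares the method as virtual):

bool requiresStreamingWithSpecialTokens() const override {
    // Devstral needs the [TOOL_CALLS] / [ARGS] special tokens preserved in the streamed text.
    return true;
}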

DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) :
BaseOutputParser(tokenizer),
argsTokenId(tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),
botTokenId(tokenizer.encode("[TOOL_CALLS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),
Collaborator

Validate that the input_ids token count is == 1?

Collaborator

Agreed. We could also do that for argsTokenId and fail if, in either case, encoding produces more than one token.
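
For example, a minimal sketch of such a check (e.g., in a helper used from the initializer list; assuming ov::Tensor::get_size() gives the number of encoded token ids for a single input):

auto encoded = tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids;
if (encoded.get_size() != 1) {
    throw std::runtime_error("[ARGS] must be a single token in the tokenizer vocabulary.");
}
const int64_t argsTokenId = encoded.data<int64_t>()[0];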

ToolCall toolCall;
std::string tool_name = tokenizer.decode(tool_name_tokens, ov::AnyMap{ov::genai::skip_special_tokens(true)});
if (this->toolSchemas.find(tool_name) == this->toolSchemas.end()) {
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool name '{}' not valid.", tool_name);
Collaborator

This is behavior we haven't implemented in other parsers; is it really worth returning early? If we return a function name that is not part of the tool schemas spec, we might be able to debug it in BFCL.

Collaborator

This is not in line with the current behavior of other parsers. I wouldn't add that check if it's only for this parser. Either drop it or create a task to align the other parsers.

if (pos == 0) {
this->streamContent.clear();
} else {
this->streamContent = this->streamContent.substr(pos + 13); // "[TOOLS_CALLS]" length is 13
Collaborator

We should avoid magic numbers; if we change this->streamingParsingToolCallsStartTag to another value, this part will become incorrect.

Collaborator

You can look at how Adrian handles that in the Qwen coder parser:

this->lastProcessedPosition = pos + Qwen3CoderToolParser::PARAMETER_END_TAG.length();
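
Applied here, a sketch of the same approach, reusing the existing tag members instead of hard-coded lengths:

this->streamContent = this->streamContent.substr(pos + this->streamingParsingToolCallsStartTag.length());
// and likewise pos + this->streamingParsingArgsStartTag.length() for the [ARGS] case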

ToolCall toolCall;
toolCall.arguments = arguments;
toolCall.name = this->toolName;
return sendFullDelta(toolCall);
Collaborator

Shouldn't we stream partial function argument chunks? If I understand correctly, you send the full delta at the end of generation.

Collaborator

We already accepted such an approach for Qwen3 Coder, so I suppose we can have it in other parsers as well, unless there are specific requirements for "real" streaming.

bool requiresStreamingWithSpecialTokens() const {
-    return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) &&
+    return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) ||
         (toolParser && toolParser->requiresStreamingWithSpecialTokens());
Collaborator

@mzegla when I implemented it, I remember your comment about why it should really be && instead of ||. Do you remember what the reason was?

Collaborator

This has been implemented this way to make sure we don't allow two parsers with different special-token approaches, as they will receive the same model output, so they must both either require it or not.
I guess for this case, where we don't have a reasoning parser but want to require special tokens for the tool parser, we should modify this function like:

if (!reasoningParser) {
    return toolParser && toolParser->requiresStreamingWithSpecialTokens();
} else if (!toolParser) {
    return reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens();
} else {
    return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) &&
           (toolParser && toolParser->requiresStreamingWithSpecialTokens());
}
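
An equivalent condensed sketch (same semantics: at least one parser must be present, and every parser that is present must require special-token streaming):

bool reasoningOk = !reasoningParser || reasoningParser->requiresStreamingWithSpecialTokens();
bool toolOk = !toolParser || toolParser->requiresStreamingWithSpecialTokens();
return (reasoningParser || toolParser) && reasoningOk && toolOk;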

}

TEST_F(DevstralOutputParserTest, ParseToolCallOutputWithSingleToolCall_MissingEndTag) {
std::string testInput = "Reasoninig before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}";
Collaborator

Can you add tests for scenarios with whitespace between the tags? I've seen other models often put spaces or new lines before/after the function name.

{"{\"", ov::genai::GenerationFinishReason::NONE, std::nullopt},
{"city\":", ov::genai::GenerationFinishReason::NONE, std::nullopt},
{" \"Paris", ov::genai::GenerationFinishReason::NONE, std::nullopt},
// Last chunk is added in the for loop below
Collaborator

Missing test for the enclosing scenario.

Collaborator

Add a test for empty args.
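
For example, a sketch following the existing test inputs in this PR (example_tool is the tool name used by the other tests):

std::string testInput = "[TOOL_CALLS]example_tool[ARGS]{}</s>";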

Comment on lines 31 to 36
const int64_t argsTokenId; // [ARGS]
const int64_t botTokenId; // [TOOL_CALLS]

// in streaming mode we can rely on tags in string format as tokens are not available
const std::string streamingParsingArgsStartTag = "[ARGS]";
const std::string streamingParsingToolCallsStartTag = "[TOOL_CALLS]";
Collaborator

Those tags/tokens are not specific to streaming, so I think we can drop the streamingParsing prefix.
These variables describe the same things, so please unify the naming:
either botToken or toolCallsStart:
toolCallsStartTokenId, toolCallsStartTag; or
botTokenId, botTag
and either argsTokenId or argsStartTag:
argsStartTokenId, argsStartTag; or
argsTokenId, argsTag

Collaborator

^^

Collaborator

So those variables are used only in streaming mode now; that's why this naming was selected. But I'm not sure about it, since it's essentially just a to-string mapping of special tokens that are not limited to streaming.

@dkalinowski , @atobiszei - what's your opinion on this?

Collaborator

Sure, checking other parsers I see the naming convention to be:

    static const std::string parsingStartTag;
    static const std::string parsingStartTag2;
    static const std::string parsingEndTag;
    const std::string parsingStartTag = "<|python_tag|>";
    const std::string parsingEndTag = "";

so following our convention it should be

    static const std::string parsingArgsStartTag; // = "[ARGS]"; <---- and implementation moved to cpp
    static const std::string parsingToolCallsStartTag; // = "[TOOL_CALLS]"; <--- and implementation moved to cpp
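
The matching definitions in the cpp file would then be something like (a sketch, using the class name from this PR):

const std::string DevstralToolParser::parsingArgsStartTag = "[ARGS]";
const std::string DevstralToolParser::parsingToolCallsStartTag = "[TOOL_CALLS]";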


EXPECT_EQ(parsedOutput.reasoning, "");
ASSERT_EQ(parsedOutput.toolCalls.size(), 0);
}

void DevstralToolParser::parse(ParsedOutput& parsedOutput, const std::vector<int64_t>& generatedTokens) {
std::vector<std::string> tools;
// Parser will consume entire model output only if the first generated token is the beginning of tools token.
Collaborator

It does not look like this comment is true for this parser.

}
}
if (this->internalState == AWAITING_ARGS_TAG) {
// check if [ARGS] tag is present in the chunk and update state accordingly
Collaborator

Suggested change
// check if [ARGS] tag is present in the chunk and update state accordingly
// check if [ARGS] tag is present in the streamContent and update state accordingly

Collaborator Author

Technically we check streamContent, but that will only be the case if [ARGS] was added in the chunk; otherwise we would be in a different state.

Collaborator

Still, in the line below we check streamContent, not the chunk.

if (pos != std::string::npos) {
this->internalState = PROCESSING_ARGS;
this->toolName = this->streamContent.substr(0, pos);
if (this->toolSchemas.find(this->toolName) == this->toolSchemas.end()) {
Collaborator

As for the unary part - this check is unique to this parser and I don't think it's a good idea to have different behavior for different parsers. Either remove it or create a task to align the other parsers.

SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool name '{}' not valid.", this->toolName);
return std::nullopt;
}
this->streamContent = this->streamContent.substr(pos + 6); // "[ARGS]" length is 6
Collaborator

Magic number

}
}
if (finishReason != ov::genai::GenerationFinishReason::NONE) {
size_t end_pos = this->streamContent.find("</s>");
Collaborator

What is this token? If it has some significant value for the parsing, it should be a member of the parser class, like the args and tool-calls tokens. Also:

Suggested change
size_t end_pos = this->streamContent.find("</s>");
size_t endPos = this->streamContent.find("</s>");
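
For example, a sketch following the static-member pattern suggested elsewhere in this review:

// header
static const std::string streamingEndTag;
// cpp
const std::string DevstralToolParser::streamingEndTag = "</s>";
// usage
size_t endPos = this->streamContent.find(streamingEndTag);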

exit /b 1
)

if exist "%~1\%DEVSTRAL_MODEL%\%TOKENIZER_FILE%" (
Collaborator

I think this whole section in the bat file could be replaced with something like:

:: ===========================
:: Tokenizer-only model calls
:: ===========================

call :EnsureTokenizer "%~1" "%QWEN3_MODEL%"   "%TOKENIZER_FILE%" "Qwen3"
call :EnsureTokenizer "%~1" "%LLAMA3_MODEL%"  "%TOKENIZER_FILE%" "Llama3.1"
call :EnsureTokenizer "%~1" "%HERMES3_MODEL%" "%TOKENIZER_FILE%" "Hermes3"
call :EnsureTokenizer "%~1" "%PHI4_MODEL%"    "%TOKENIZER_FILE%" "Phi-4"
call :EnsureTokenizer "%~1" "%MISTRAL_MODEL%" "%TOKENIZER_FILE%" "Mistral"
call :EnsureTokenizer "%~1" "%GPTOSS_MODEL%"  "%TOKENIZER_FILE%" "GPT-OSS"
call :EnsureTokenizer "%~1" "%DEVSTRAL_MODEL%" "%TOKENIZER_FILE%" "Devstral"

endlocal
exit /b 0

:: =========================================================
:: Function: EnsureTokenizer
:: %1 = base directory
:: %2 = model name
:: %3 = tokenizer file
:: %4 = display name
:: =========================================================
:EnsureTokenizer
if exist "%~1\%~2\%~3" (
  echo Models file %~1\%~2\%~3 exists. Skipping downloading models.
) else (
  echo Downloading tokenizer and detokenizer for %~4 model to %~1\%~2 directory.
  mkdir "%~1\%~2" 2>nul
  convert_tokenizer "%~2" --with_detokenizer -o "%~1\%~2"
  if errorlevel 1 exit /b %errorlevel%
)

if not exist "%~1\%~2\%~3" (
  echo Models file %~1\%~2\%~3 does not exist.
  exit /b 1
)

exit /b 0

Comment on lines +21 to +23
#include "../../../llm/io_processing/base_output_parser.hpp"
#include "../../../llm/io_processing/output_parser.hpp"
#include "../../platform_utils.hpp"
Collaborator

Suggested change
#include "../../../llm/io_processing/base_output_parser.hpp"
#include "../../../llm/io_processing/output_parser.hpp"
#include "../../platform_utils.hpp"
#include "src/llm/io_processing/base_output_parser.hpp"
#include "src/llm/io_processing/output_parser.hpp"
#include "src/platform_utils.hpp"


#include "src/port/rapidjson_document.hpp"

#include "../../../logging.hpp"
Collaborator

Suggested change
#include "../../../logging.hpp"
#include "src/logging.hpp"


#include "../../../logging.hpp"
#include "tool_parser.hpp"
#include "../utils.hpp"
Collaborator

Suggested change
#include "../utils.hpp"
#include "src/utils.hpp"

{"[ARGS]", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"id":"XXXXXXXXX","type":"function","index":0,"function":{"name":"get_weather"}}]}})"},
{"{\"", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}})"},
{"city\":", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"index":0,"function":{"arguments":"city\":"}}]}})"},
{" \"Paris", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"index":0,"function":{"arguments":" \"Paris"}}]}})"},
Collaborator

Aren't we missing \ before " here?

Suggested change
{" \"Paris", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"index":0,"function":{"arguments":" \"Paris"}}]}})"},
{" \"Paris", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"index":0,"function":{"arguments":" \"Paris\"}}]}})"},

@mzegla mzegla requested a review from Copilot January 7, 2026 09:52
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for the Devstral model's tool calling capabilities, implementing a custom tool parser that handles the model's specific format: [TOOL_CALLS]tool_name[ARGS]arguments.
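
For example, an output such as

[TOOL_CALLS]example_tool[ARGS]{"arg1":"value1","arg2":42}</s>

is parsed into a single tool call named example_tool with that JSON object as its arguments (mirroring the test inputs added in this PR).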

Key changes:

  • Implements DevstralToolParser for parsing tool calls in both streaming and non-streaming modes
  • Adds DevstralGenerationConfigBuilder to configure tool-guided generation with Devstral-specific tags
  • Integrates Devstral parser into the output parsing and generation config pipelines
  • Adds comprehensive test coverage for various tool call scenarios

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file

File - Description
windows_prepare_llm_models.bat - Adds model download logic for Devstral tokenizer
prepare_llm_models.sh - Adds model download logic for Devstral tokenizer (Linux)
src/test/llm/output_parsers/devstral_output_parser_test.cpp - Comprehensive test suite for Devstral tool parsing scenarios
src/llm/servable.cpp - Adds error handling for API handler creation
src/llm/io_processing/output_parser.hpp - Fixes logic for checking streaming requirements when parsers are optional
src/llm/io_processing/output_parser.cpp - Registers Devstral tool parser
src/llm/io_processing/generation_config_builder.hpp - Registers Devstral generation config builder
src/llm/io_processing/devstral/tool_parser.hpp - Header for Devstral tool parsing implementation
src/llm/io_processing/devstral/tool_parser.cpp - Core implementation of Devstral tool call parsing logic
src/llm/io_processing/devstral/generation_config_builder.hpp - Header for Devstral generation configuration
src/llm/io_processing/devstral/generation_config_builder.cpp - Implementation of Devstral-specific generation configuration
src/llm/BUILD - Updates build configuration to include Devstral files


Comment on lines +98 to +102
std::string testInput = "Reasoninig before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}";
auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoninig before tool call ");
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

Suggested change
std::string testInput = "Reasoninig before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}";
auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoninig before tool call ");
std::string testInput = "Reasoning before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}";
auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoning before tool call ");

auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoninig before tool call ");
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

}

TEST_F(DevstralOutputParserTest, ParseToolCallOutputWithContentAndSingleToolCall) {
std::string testInput = "Reasoninig before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}</s>";
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoninig before tool call ");
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

}

TEST_F(DevstralOutputParserTest, ParseToolCallOutputWithInvalidOrder) {
std::string testInput = "Reasoninig before tool call [ARGS]example_tool[TOOL_CALLS]{\"arg1\":\"value1\",\"arg2\":42}</s>";
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoninig before tool call example_tool{\"arg1\":\"value1\",\"arg2\":42}");
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

namespace ovms {

/*
* Phi4GenerationConfigBuilder extends BaseGenerationConfigBuilder to provide specific configuration for Phi-4 model.
Copilot AI Jan 7, 2026

The comment incorrectly refers to Phi4GenerationConfigBuilder instead of DevstralGenerationConfigBuilder. Update the comment to accurately describe the Devstral class.

Suggested change
* Phi4GenerationConfigBuilder extends BaseGenerationConfigBuilder to provide specific configuration for Phi-4 model.
* DevstralGenerationConfigBuilder extends BaseGenerationConfigBuilder to provide specific configuration for the Devstral model.

Collaborator

@mzegla mzegla left a comment

Comment on lines +107 to +112
// now we need to add string toolCall.arguments to argumentsWrapper under "arguments" key

toolCallsString.SetString(toolCall.arguments.c_str(), wrappedDelta.GetAllocator());
functionObj.AddMember("arguments", toolCallsString, wrappedDelta.GetAllocator());
toolCallObj.AddMember("function", functionObj, wrappedDelta.GetAllocator());
toolCalls.PushBack(toolCallObj, wrappedDelta.GetAllocator());
Collaborator

Why do you mix the function name and arguments in one delta? Other parsers follow the pattern where the first delta contains the function name and other metadata, and arguments are sent in the following deltas.

Collaborator

Looks like this particular case is not covered in other parsers - likely due to a more verbose format where we don't see the whole output provided as one chunk. In that case I'm more open to accepting this change even though it breaks the "no arguments in the first delta" rule that we have followed so far.

*/

this->streamContent += chunk;
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Chunk content: '{}', StreamContent: '{}', State: {}", chunk, this->streamContent, std::to_string(this->internalState));
Collaborator

That's quite detailed information. Should it really be on DEBUG and not on TRACE?

size_t pos = chunk.find(this->streamingParsingToolCallsStartTag);
if (pos != std::string::npos) {
this->internalState = AWAITING_ARGS_TAG;
std::cout << "Found [TOOL_CALLS] tag in chunk."
Collaborator

Remove it or switch to using the logger.
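
For example, if the message is worth keeping, a sketch using the existing logger:

SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Found {} tag in chunk.", this->streamingParsingToolCallsStartTag);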


DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) :
BaseOutputParser(tokenizer),
argsTokenId([&tokenizer, this]() {
// can not use streamingParsingArgsStartTag because object is not initialized yet
Collaborator

It is not initialized because it is a class field, which adds memory overhead. Why can't this be a static const field? It would be initialized early and could be used here.

Collaborator

To do that, simply declare the field as

static const std::string streamingEndTag;

in the header file, and then define it in the cpp file:

const std::string DevstralParser::streamingEndTag = "</s>";

Comment on lines 68 to 70
if (encoded.get_shape()[0] != 1) {
throw std::runtime_error("[TOOL_CALLS] must be a single token in the tokenizer vocabulary.");
}
Collaborator

Suggested change
if (encoded.get_shape()[0] != 1) {
throw std::runtime_error("[TOOL_CALLS] must be a single token in the tokenizer vocabulary.");
}
if (encoded.get_shape().size() != 1) {
throw std::runtime_error("[TOOL_CALLS] token shape must have 1 dimension");
}
if (encoded.get_shape()[0] != 1) {
throw std::runtime_error("[TOOL_CALLS] must be a single token in the tokenizer vocabulary.");
}

Comment on lines +89 to +90
static const std::string toolCallEndTag = "</s>";
return toolCallEndTag;
Collaborator

Suggested change
static const std::string toolCallEndTag = "</s>";
return toolCallEndTag;
return streamingEndTag;

code duplication

*/

this->streamContent += chunk;
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Chunk content: '{}', StreamContent: '{}', State: {}", chunk, this->streamContent, std::to_string(this->internalState));
Collaborator Author

Suggested change
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Chunk content: '{}', StreamContent: '{}', State: {}", chunk, this->streamContent, std::to_string(this->internalState));
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Chunk content: '{}', StreamContent: '{}', State: {}", chunk, this->streamContent, std::to_string(this->internalState));

size_t pos = chunk.find(this->streamingParsingToolCallsStartTag);
if (pos != std::string::npos) {
this->internalState = AWAITING_ARGS_TAG;
std::cout << "Found [TOOL_CALLS] tag in chunk."
Collaborator Author

Suggested change
std::cout << "Found [TOOL_CALLS] tag in chunk."

if (pos != std::string::npos) {
this->internalState = AWAITING_ARGS_TAG;
std::cout << "Found [TOOL_CALLS] tag in chunk."
<< " Current state: " << this->internalState << std::endl;
Collaborator Author

Suggested change
<< " Current state: " << this->internalState << std::endl;

}
}
if (this->internalState == AWAITING_ARGS_TAG) {
// check if [ARGS] tag is present in the chunk and update state accordingly
Collaborator Author

Suggested change
// check if [ARGS] tag is present in the chunk and update state accordingly
